ABSTRACT

With the development of Internet culture, cuteness has become
a popular concept. Many people are curious about what
factors make a person look cute. However, little research
has addressed this interesting question. In this work, we
construct a dataset of personal images with comprehensively
annotated cuteness scores and facial attributes to investigate
this high-level concept in depth. Based on this dataset,
through an automatic attribute mining process, we find several
critical attributes determining the cuteness of a person.
We also develop a novel Continuous Latent Support Vector
Machine (C-LSVM) method to predict the cuteness score of
a person given only his or her image. Extensive evaluations
validate the effectiveness of the proposed method for cuteness
prediction.
1. INTRODUCTION

Cuteness describes a type of attractiveness commonly associated
with youth and appearance, which activates in others
the motivation to care [^. Recent studies suggest that
cute images stimulate the pleasure centers of the brain, which
are closely related to positive human emotion [^.
This explains why people prefer cute persons or things
in social networks, shopping, browsing images and videos on the
web, and so on. For example, some surveys show that women’s
fashions opt for the cute even over the sensible or glamorous^.
This makes cuteness quite an important factor to
consider in product design, advertisement and so on.

Figure 1: The framework of cuteness prediction.
For each input image, we first detect the bounding
boxes of the human face and neighboring body
parts, and then extract the appearance features accordingly.
Based on the appearance features, we
infer the underlying middle-level attributes and the
cuteness score simultaneously.

Cuteness has received the attention of psychologists and 
neuroscientists for several years. For example, Kim et al. 
conducted studies on why humans think certain animals are 
cute using functional magnetic resonance imaging (fMRI) 
to measure changes in human brain activity. Some evidence 
from this work suggests the brain activity is greater when 
the stimulus has juvenile characteristics - a button nose, big 
eyes, a large wobbly head, fat cheeks, etc. 

Though it has been investigated by psychologists and neuroscientists,
cuteness has not yet attracted the attention of computer
scientists. In the computer vision and multimedia
fields, there are plenty of works focusing on recognizing ordinary
expressions, e.g., happiness and sadness. However,
cuteness, which is beyond these traditional expressions and
has higher-level semantics, is far more difficult to recognize.
In this work, we explore the secrets of cuteness through the
application of machine learning techniques. This is the first
research attempt at a computational analysis of what factors
determine cuteness, a high-level concept. We construct
a model that learns from human images and their respective
cuteness ratings to produce human-like evaluations of
cuteness. Our work is based on the underlying theory that
there are objective regularities in cuteness to be analyzed
and learned. In this work, we also provide a general
framework for investigating and analyzing other high-level
expressions, such as “funny” and “scary”.

^http://www.nytimes.com/2006/01/03/science/03cute.html?pagewanted=all

To investigate the factors determining the cuteness of a
person, we construct a large dataset of human images with
comprehensively annotated attributes and cuteness scores.
In this work, the attributes are defined empirically and used
as middle-level descriptors of certain characteristics of
the persons. For example, some of the attributes describe
the facial appearance, such as skin smoothness and age, while
others describe the pose or expression of the person, such
as smile and face cover. Based on this dataset, we propose
a novel model to automatically learn which features and
attributes determine the cuteness of a person. We then train
predictors on these features and attributes to predict
the cuteness score of a new person image. In previous works,
latent SVM has been a widely adopted method for attribute
mining and prediction [^[^. However, for the cuteness prediction
problem, the annotations of the samples (cuteness
scores) are continuous values. Thus, the traditional latent
SVM, which can only handle discrete annotations, cannot
be applied here. To solve this issue, we propose a novel
Continuous Latent SVM (C-LSVM) method, which can handle
continuous sample labels. We also show that C-LSVM is more
general than the standard latent SVM and has great potential
for solving many other problems that involve predicting
continuous variables.

Studying what makes a person look cute and how to predict
cuteness from person images alone may have many useful
applications. These applications include choosing a collection
of cute clips from a video to generate an attractive video
summary, organizing photos in an album according to
their cuteness, and automatically retrieving the cute images
in a large image set. Cuteness prediction and generation can
also greatly benefit advertisement and product design.

2. DATASET CONSTRUCTION 

Since no dataset exists for cuteness research, we collect
a new dataset of personal images ourselves in order to
investigate this problem. The images are crawled
from the web, and 4,800 images are collected in total. Most
of these images contain the frontal face of a person.

We invite 40 subjects to participate in the annotation of
the images’ scores and attributes. To relieve the burden
of cuteness score annotation, we adopt a
k-wise comparison to estimate the rank of the images and
then automatically infer their absolute scores [^ |^ . In each
round of annotation, the subjects are required to rank k
images in descending order of their scores. After they finish
all the annotations, we estimate the absolute cuteness scores
with a rank SVM method, which maintains
the rank of the photos annotated in each k-wise annotation.
For the details please refer to [^ [^. The cuteness
degree of a person heavily depends on his or her appearance
and pose. For example, a young and pretty girl pouting her
mouth will look quite cute. In this work, we define the
following 19 attributes to comprehensively describe the
appearance and pose of a person at a middle semantic level.
The adopted attributes include Gender (Male, Female), Age
(Young, Teen, Middle, Old), Eye (Open, Close), Mouth
Variation, Mouth (Open, Close), Teeth (Visible, Invisible),
Smile, Wearing Glasses, Beard, Skin Color (Bright, Dark),
Hair Color (Black, Blonde, Other), Hair Ornaments, Face
Cover, Skin Smoothness. For the mouth variation attribute,
poses such as pout and muster cheek are assigned
the value of 1. The face cover attribute includes poses such
as a finger touching the lips, a hand holding the jaw and so on.
Some examples of the above attributes are shown in the
accompanying figures. Most of the attributes admit two values
and are represented by a binary variable.

Figure 2: Examples of the collected photos with annotated
attributes and cuteness scores.
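
As an illustration of the k-wise annotation described in this section, the following Python sketch expands one annotated ranking into the pairwise ordering constraints that a rank SVM would be asked to preserve. The function name and the image identifiers are hypothetical and are not taken from the original annotation tool.

```python
from itertools import combinations

def ranking_to_pairs(ranked_ids):
    """Expand one k-wise ranking (cutest first) into (higher, lower) pairs."""
    # combinations() keeps the input order, so the first element of each
    # pair was always ranked above the second by the annotator.
    return list(combinations(ranked_ids, 2))

# One annotation round with k = 3 hypothetical images.
pairs = ranking_to_pairs(["img_a", "img_b", "img_c"])
```

A ranking of k images yields k(k-1)/2 such pairs, so a single round carries more ordering information than one pairwise comparison.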

3. C-LSVM FOR CUTENESS PREDICTION 

3.1 Features 

We first run the Viola-Jones face detector on the collected
images to obtain bounding boxes for the human faces. Then
we extract the appearance features from the face bounding
box and the four spatially neighboring bounding boxes of
the same size (see the framework figure). All of the bounding
boxes are resized to 128 x 128 pixels. We extract Gabor,
LBP [3] and HOG features within the bounding boxes to
describe the pose and facial texture of persons. More
specifically, for the Gabor feature, we extract the filter responses
at 5 scales and 8 orientations and form a 128 x 128 x 5 x 8
feature vector for each bounding box. Its dimensionality
is then reduced to 200 via PCA. Similarly,
we extract the HOG feature and reduce its dimensionality from
8,000 to 200, and the LBP feature from 256 to 40. We
experimentally find that reducing the feature dimensionality
improves both the prediction performance and efficiency.
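
The PCA step can be sketched as follows. This is a minimal NumPy illustration, assuming the features of all images are stacked row-wise; the random matrix only stands in for real LBP histograms, and the sample count of 500 is made up. In practice the projection would be estimated on the training images and reused for the test images.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
lbp = rng.random((500, 256))        # stand-in for 256-dimensional LBP features
lbp_reduced = pca_reduce(lbp, 40)   # 256 -> 40, the reduction quoted in the text
```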

3.2 Prediction Model 

We propose a continuous latent support vector machine
(C-LSVM) for modeling the relationship between the raw
features, the attributes and the cuteness score. In particular,
we use three linear models to describe the relationships
between raw feature $\mathbf{x}$ and attributes $\mathbf{a}$, raw feature $\mathbf{x}$ and
cuteness score $y$, and attributes $\mathbf{a}$ and cuteness score $y$,
respectively. For the i-th image, we extract the aforementioned
features $\mathbf{x}_i$. The relationships of the attributes and cuteness
score with the raw feature are modeled as follows:

$$y_i = \mathbf{w}_{x,y}^T \mathbf{x}_i + b_{x,y},$$

$$\mathbf{a}_i = \mathbf{W}_{x,a}^T \mathbf{x}_i + \mathbf{b}_{x,a},$$

$$y_i = \mathbf{w}_{a,y}^T \mathbf{a}_i + b_{a,y}.$$

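Here $\mathbf{w}_{x,y}, b_{x,y}, \mathbf{W}_{x,a}, \mathbf{b}_{x,a}, \mathbf{w}_{a,y}, b_{a,y}$ are the parameters of
the linear prediction models, which will be determined in the
model learning process. In the proposed C-LSVM method,
the cuteness score of an image is inferred by maximizing the
following fitness function,

$$f(\mathbf{x}_i, \mathbf{a}, y) = -\beta_1 (\mathbf{w}_{x,y}^T \mathbf{x}_i + b_{x,y} - y)^2 - \beta_2 (\mathbf{w}_{a,y}^T \mathbf{a} + b_{a,y} - y)^2 - \|\Lambda (\mathbf{W}_{x,a}^T \mathbf{x}_i + \mathbf{b}_{x,a} - \mathbf{a})\|^2 + \mathbf{a}^T (\mathbf{P} \otimes \mathbf{M}) \mathbf{a}. \quad (1)$$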

The fitness function is a linear combination of the following
three types of fitness, i.e., score prediction from the raw
feature, score prediction from the attributes, and attribute
prediction from the raw feature. The last term accounts for
the attribute correlation, which encourages correlated
attributes to be predicted simultaneously and can help improve
the attribute prediction accuracy. The matrix M,
constructed from the statistics of the training data,
is of size n x n and accounts for the attribute co-occurrence.

Besides the three linear models, there are also parameters
trading off the cost terms, including $\beta_1, \beta_2, \Lambda, \mathbf{P}$. $\Lambda$ is a
diagonal matrix of size n x n. The i-th element on the
diagonal weights the prediction cost for the i-th attribute,
which in fact reflects the importance of the i-th attribute in
the cuteness score prediction. $\mathbf{P}$ is a matrix of the
same size as $\mathbf{M}$, which weights the co-occurrence of each
pair of attributes. All of these parameters, including $\beta_1$,
$\beta_2$, $\Lambda$ and $\mathbf{P}$, need to be determined in the learning process.
Here we also employ a latent max-margin framework for the
parameter learning, and the details are provided in the
following subsection.
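
To make the fitness function of Eq. (1) concrete, the sketch below evaluates it for a single image with NumPy. It is an illustration under stated assumptions rather than the original implementation: the dictionary keys and all toy values are hypothetical, and the product between P and M is read as element-wise, since the text describes P as having the same size as M and weighting each attribute pair.

```python
import numpy as np

def fitness(x, a, y, p):
    """Evaluate the fitness of Eq. (1) for one image; larger is better.

    p is a dict of learned quantities (the key names are hypothetical).
    """
    score_from_x = p["w_xy"] @ x + p["b_xy"]         # score predicted from raw features
    score_from_a = p["w_ay"] @ a + p["b_ay"]         # score predicted from attributes
    attr_residual = p["W_xa"].T @ x + p["b_xa"] - a  # attribute prediction error
    return float(
        -p["beta1"] * (score_from_x - y) ** 2
        - p["beta2"] * (score_from_a - y) ** 2
        - np.sum((p["Lam"] @ attr_residual) ** 2)
        + a @ (p["P"] * p["M"]) @ a                  # assumption: element-wise product
    )

# Toy problem: 3 raw features, 2 attributes, all values made up.
params = {
    "w_xy": np.zeros(3), "b_xy": 5.0,
    "w_ay": np.zeros(2), "b_ay": 5.0,
    "W_xa": np.zeros((3, 2)), "b_xa": np.array([1.0, 0.0]),
    "beta1": 1.0, "beta2": 1.0,
    "Lam": np.eye(2), "P": np.ones((2, 2)), "M": np.eye(2),
}
x = np.ones(3)
a = np.array([1.0, 0.0])
fit_at_5 = fitness(x, a, 5.0, params)  # both score models agree with y = 5
fit_at_6 = fitness(x, a, 6.0, params)  # moving y away lowers the fitness
```

In this toy setting both linear score models predict 5, so the fitness is highest at y = 5 and drops as the candidate score moves away from it.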


3.3 Model Learning 

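In the learning process, we construct the following three
prediction functions: the first one predicts the cuteness score
from the raw feature, $\mathbf{w}_{x,y} : \mathbf{x} \to y$; the second one predicts
the attributes from the raw feature, $\mathbf{W}_{x,a} : \mathbf{x} \to \mathbf{a}$; the
third one predicts the cuteness score from the attributes,
$\mathbf{w}_{a,y} : \mathbf{a} \to y$. We adopt a max-margin regression scheme to
learn the above three prediction functions individually. For the
first function, the problem is:

$$
\begin{aligned}
\min_{\mathbf{w}_{x,y},\, b_{x,y},\, \xi,\, \xi^*} \quad & \frac{1}{2}\|\mathbf{w}_{x,y}\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*) \\
\text{s.t.} \quad & y_i - \langle \mathbf{w}_{x,y}, \mathbf{x}_i \rangle - b_{x,y} \le \epsilon + \xi_i, \\
& \langle \mathbf{w}_{x,y}, \mathbf{x}_i \rangle + b_{x,y} - y_i \le \epsilon + \xi_i^*, \\
& \xi_i, \xi_i^* \ge 0.
\end{aligned}
\quad (2)
$$

After optimizing the above objective function via off-the-shelf
solvers, we obtain the parameters $\mathbf{w}_{x,y}, b_{x,y}$ of the
first prediction function. The parameters of the other two
prediction functions, $\mathbf{W}_{x,a}, \mathbf{b}_{x,a}, \mathbf{w}_{a,y}, b_{a,y}$, can be obtained in a similar way.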
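
For intuition about Eq. (2), the sketch below evaluates the standard support vector regression objective after eliminating the slack variables, using the fact that at the optimum the two slacks of a sample sum to max(0, |residual| - epsilon). The trade-off constant C, the value of eps and the toy data are illustrative assumptions; an actual run would hand the problem to an off-the-shelf SVR solver, as the text describes.

```python
import numpy as np

def svr_objective(w, b, X, y, C=1.0, eps=0.5):
    """Objective of Eq. (2) with the slack variables eliminated."""
    residual = y - (X @ w + b)
    # At the optimum, xi_i + xi_i^* = max(0, |residual_i| - eps).
    slack = np.maximum(0.0, np.abs(residual) - eps)
    return float(0.5 * (w @ w) + C * slack.sum())

X = np.array([[1.0], [2.0]])
y = np.array([1.0, 2.0])
perfect = svr_objective(np.array([1.0]), 0.0, X, y)  # zero slack, only the regularizer
flat = svr_objective(np.array([0.0]), 0.0, X, y)     # no regularizer, only slack
```

The comparison shows the trade-off that the solver balances: the perfectly fitting model pays only the norm penalty, while the flat model pays only for residuals that exceed the tolerance.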

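For the optimization of the parameters $\beta_1, \beta_2, \Lambda, \mathbf{P}$,
we propose the following C-LSVM method. The objective
function is defined as:

$$
\begin{aligned}
\min_{\mathbf{z}} \quad & \frac{1}{2}\|\mathbf{z}\|^2 + \sum_{i=1}^{m} \xi_i \\
\text{s.t.} \quad & \max_{\mathbf{a} \in \mathcal{A}} \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y_i) \ge \max_{\mathbf{a} \in \mathcal{A}} \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y) + \Delta(y_i, y) - \xi_i, \quad \forall y \in \mathcal{Y}_i,
\end{aligned}
$$

where the set $\mathcal{Y}_i$ is defined as $\mathcal{Y}_i = \{y \mid y < y_i - \epsilon\} \cup \{y \mid y > y_i + \epsilon\}$,
containing all of the incorrect cuteness scores. $\epsilon$ is
a parameter that controls the tolerance of the prediction error
and is set to 0.5 throughout the experiments. The set
$\mathcal{A}$ is defined as $\mathcal{A} = \{\mathbf{a} \mid 0 \le a_i \le 1, \forall i = 1, \ldots, n\}$. Here
the parameter $\mathbf{z}$ is the concatenation of $\beta_1, \beta_2, \Lambda, \mathbf{P}$. The
potential function $\phi(\mathbf{x}_i, \mathbf{a}, y)$ is defined as:

$$\phi(\mathbf{x}_i, \mathbf{a}, y) = [\phi_{x,y}(\mathbf{x}_i, y);\ \phi_{a,y}(\mathbf{a}, y);\ \phi_{x,a}(\mathbf{x}_i, \mathbf{a});\ \phi_{a,a}(\mathbf{a}, \mathbf{a})].$$

In particular, the four component potential functions are
defined as follows:

$$
\begin{aligned}
\phi_{x,y}(\mathbf{x}_i, y) &= -(\mathbf{w}_{x,y}^T \mathbf{x}_i + b_{x,y} - y)^2, \\
\phi_{a,y}(\mathbf{a}, y) &= -(\mathbf{w}_{a,y}^T \mathbf{a} + b_{a,y} - y)^2, \\
\phi_{x,a}(\mathbf{x}_i, \mathbf{a}) &= -(\mathbf{W}_{x,a}^T \mathbf{x}_i + \mathbf{b}_{x,a} - \mathbf{a}) \otimes (\mathbf{W}_{x,a}^T \mathbf{x}_i + \mathbf{b}_{x,a} - \mathbf{a}), \\
\phi_{a,a}(\mathbf{a}, \mathbf{a}) &= \mathbf{a}^T \mathbf{P} \otimes \mathbf{M} \mathbf{a}.
\end{aligned}
$$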


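The above optimization problem is equivalent to minimizing
the following loss function:

$$L(\mathbf{z}) = \frac{1}{2}\|\mathbf{z}\|^2 + R(\mathbf{z}), \quad (3)$$

where

$$R(\mathbf{z}) = \max_{\mathbf{a} \in \mathcal{A},\, y \in \mathcal{Y}_i} \left[ \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y) + \Delta(y_i, y) \right] - \max_{\mathbf{a} \in \mathcal{A}} \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y_i). \quad (4)$$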
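
The equivalence can be seen by eliminating the slack variable. As a brief sketch in the notation of the constrained problem: for a fixed $\mathbf{z}$, the margin constraint of sample $i$ must hold for every incorrect score $y \in \mathcal{Y}_i$, so the smallest slack that satisfies all of these constraints is

$$\xi_i = \max_{y \in \mathcal{Y}_i} \left[ \max_{\mathbf{a} \in \mathcal{A}} \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y) + \Delta(y_i, y) \right] - \max_{\mathbf{a} \in \mathcal{A}} \mathbf{z}^T \phi(\mathbf{x}_i, \mathbf{a}, y_i).$$

Because the objective only decreases as $\xi_i$ decreases, the minimizer takes this value, and substituting it back leaves the regularizer plus the margin-violation term of Eq. (4).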

The subgradient of the above function can be calculated as
follows,

$$\partial L(\mathbf{z}) = \mathbf{z} + [\phi_{x,y}(\mathbf{x}_i, y);\ \phi_{a,y}(\mathbf{a}^*, y);\ \phi_{x,a}(\mathbf{x}_i, \mathbf{a}^*);\ \mathbf{m} \otimes \bar{\mathbf{a}}] - [\phi_{x,y}(\mathbf{x}_i, y_i);\ \phi_{a,y}(\hat{\mathbf{a}}^*, y_i);\ \phi_{x,a}(\mathbf{x}_i, \hat{\mathbf{a}}^*);\ \mathbf{m} \otimes \tilde{\mathbf{a}}].$$

Here $\mathbf{a}^*$ and $\hat{\mathbf{a}}^*$ are obtained from solving the first and second
maximization problems in (4), respectively. Note that these
optimization problems are standard quadratic programs and
can be solved efficiently. $\mathbf{m}$ is the long vector formed
by stacking the column vectors of matrix $\mathbf{M}$, and $\bar{\mathbf{a}}$ and $\tilde{\mathbf{a}}$ are
formed by stacking the columns of the matrices $\mathbf{a}^* \mathbf{a}^{*T}$ and $\hat{\mathbf{a}}^* \hat{\mathbf{a}}^{*T}$.
In the optimization, we alternately solve problem (2)
and the C-LSVM problem. In particular, problem (2) can be
solved via a standard SVR solver to obtain the individual
prediction models. Then the C-LSVM problem is solved via standard
quadratic programming, and the estimates of the underlying
attributes are updated. This procedure is repeated until
convergence.


3.4 Inference 

After learning the prediction functions and the model
parameters, the cuteness score of a new image can be inferred
as follows,

$$\{\mathbf{a}, y\} = \arg\max_{\mathbf{a}, y} \mathbf{z}^T \phi(\mathbf{x}, \mathbf{a}, y).$$

In solving the above optimization problem, we first infer the
attribute confidence vector $\mathbf{a}$. Then we binarize the elements
of $\mathbf{a}$ via a fixed threshold 0.^ After determining the

^It is widely used in applying regression for classification. 



Table 1: MAE of the cuteness score prediction. 


Method   NN   F-S   F-A-S
1.92
1.27


attributes, we then infer the cuteness score y by solving a
standard quadratic programming problem. This optimization
procedure is motivated by the observation that a sparse attribute
vector a generally produces a better prediction of the
cuteness score. The rationale is that, in the learning
process, the attribute annotations associated with each sample
are binary values. Thus the learned model prefers such
sparse input attribute vectors.
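
The two-stage inference of Section 3.4 can be sketched as follows. Once the attributes are fixed, only the two score terms of the fitness function depend on y, and their sum is a concave quadratic whose maximizer is a weighted average of the two score predictions. The sketch assumes that y is unconstrained and that the two weights are positive; the threshold value of 0.5 and all numbers are purely illustrative.

```python
def binarize(confidences, threshold):
    """Map attribute confidences to {0, 1} with a fixed threshold."""
    return [1 if c >= threshold else 0 for c in confidences]

def infer_score(score_from_x, score_from_a, beta1, beta2):
    """Maximizer of -beta1*(score_from_x - y)**2 - beta2*(score_from_a - y)**2."""
    # Setting the derivative with respect to y to zero gives a weighted mean.
    return (beta1 * score_from_x + beta2 * score_from_a) / (beta1 + beta2)

attributes = binarize([0.9, 0.2, 0.6], threshold=0.5)  # threshold is illustrative
score = infer_score(score_from_x=6.0, score_from_a=8.0, beta1=1.0, beta2=3.0)
```

With a larger weight on the attribute-based term, the inferred score moves closer to the prediction made from the attributes.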

4. EXPERIMENTS 

In this section, we evaluate the proposed C-LSVM cuteness
score prediction on the constructed dataset. In the
experiments, 3,000 images from the dataset are used for
training. We compare the C-LSVM method with the following
three methods. The first one is based on a feed-forward
neural network, which has been successfully applied
to beauty prediction [^. The second one directly
predicts the cuteness score from the raw feature with
a support vector regression model. The third one first
predicts the attributes and then predicts the score from
the estimated attributes, using two individual SVR models.

The accuracy of the cuteness score prediction is measured
by the mean absolute error (MAE). Note
that the ground-truth score ranges from 0 to 10. The
evaluation results are presented in Table 1. From the results,
we can observe that the neural network method performs
worst. Introducing the attributes improves the results
over using only the raw feature by 0.14, and our proposed
C-LSVM further reduces the MAE by 0.08 and achieves
the best result. Note that the MAEs of the last three methods
are quite small, so such an improvement is in fact significant.
We also present some attribute inference results for
the test samples in the figure of inferred attributes. We can
see that most of the attributes are correctly inferred, while
for the cheek smoothness the accuracy is relatively low. The
reason may be that the low-level features we adopt describe the
whole face region instead of only the cheek region.
Thus some noise may contaminate the prediction of
the smoothness. To show more intuitively how the defined
attributes determine the cuteness of a person, we visualize
the learned inference model $\mathbf{w}_{a,y}$ in the figure of mined
attribute weights. We observe
that the attribute cheek smoothness is the most important for
the cuteness of a person. The attributes age (young)
and skin color are also important. Meanwhile, the age (old)
and gender attributes are the least important for cuteness.
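
For reference, the MAE used in the evaluation can be computed as in the short sketch below. The scores are made-up values on the 0 to 10 scale and are not taken from the experiments.

```python
def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute difference between annotated and predicted scores."""
    if len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    total = sum(abs(t - p) for t, p in zip(true_scores, predicted_scores))
    return total / len(true_scores)

# Absolute errors are 1, 0 and 2, so the mean is 1.0.
mae = mean_absolute_error([2.0, 5.0, 9.0], [3.0, 5.0, 7.0])
```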

5. CONCLUSIONS 

In this work, we present the first research attempt at a
computational analysis of cuteness, which is beyond
ordinary expressions such as happiness or sadness. We
construct a large dataset of persons’ images with well
annotated cuteness scores and attributes. We propose the novel
C-LSVM method, which automatically mines the important
features and attributes determining the cuteness of a person.
Extensive evaluations show that our method captures
the relationship between the raw feature, the attributes and
the cuteness score better than traditional linear predictors.

Figure 3: Examples of the inferred attributes. Most
of the attributes can be inferred correctly. The
incorrectly inferred attributes are highlighted in red.

Figure 4: The mined weights of the attributes for
cuteness prediction. The horizontal axis displays the
attributes and the vertical axis shows the learned
weights.